Abstract

In some instances, group comparisons in terms of the upper or lower portions of the score distributions are more informative than comparisons of central tendency. These comparisons can be done by carrying out a split on the data prior to an analysis of variance (ANOVA). The resulting ANOVA test statistic is not distributed as an F ratio, however, and must be evaluated for significance relative to an empirical Monte Carlo distribution. An example and a computer program are presented.
Splits Analysis: A Method for Noncentral Tendency Comparisons

In the behavioral sciences, comparison of groups typically concerns a contrast of central tendency. For example, a researcher interested in the effects of violent vs. nonviolent TV programs would typically compare the mean, or perhaps the median, subsequent aggression of the first group with that of the second group.

There are instances, however, when researchers are interested not in central tendency differences but in differences in, say, the upper ten percent or the lower third of each group. For example, an industrial psychologist may want to investigate the efficacy of two training techniques in producing skill acquisition. Since only the top ten percent of trainees may be hired or promoted, the two techniques would be best evaluated in terms of their effect on the upper ten percent of each group.

Lunneborg (1986) has described a bootstrap quantile analysis appropriate for comparing two groups at given percentiles. His procedure yields the probability that the difference between the two groups' scores at a given percentile is due to chance. The present work describes an alternative to Lunneborg's bootstrap procedure and provides a computer implementation of it. The alternative is referred to as a splits analysis, as it involves carrying out a split on the data prior to statistical analysis.

Consider a data set for which we are interested in comparing the upper half of the score distributions of two groups. To carry out the splits analysis, the data would be rank ordered within groups, a median split would be carried out on each group, and the upper half of the data would be analyzed using a one-way analysis of variance (ANOVA). Although an ANOVA is carried out on the data, the resulting test statistic is not evaluated using standard F tables. Research by the author has indicated that doing so would lead to a large inflation of the Type I error rate. For example, using standard F distribution critical values typically resulted in actual Type I error rates in excess of .20 for the nominal .05 significance level (Rasmussen, 1990).
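The split-then-ANOVA step can be sketched as follows. This is a minimal illustration in Python (the program in Table 2 is FORTRAN), using a hypothetical helper `split_anova` whose 1-based, inclusive rank bounds `lo` and `hi` play the role of the program's lower and upper split values. It follows the same sums-of-squares arithmetic as the ANOVA function in Table 2 and assumes at least two retained scores per group.

```python
def split_anova(x, y, lo, hi):
    """Sort each group, keep ranks lo..hi (1-based, inclusive), and return
    (mean_x, mean_y, f) from a one-way ANOVA on the retained scores."""
    xs = sorted(x)[lo - 1:hi]
    ys = sorted(y)[lo - 1:hi]
    n = len(xs)
    tot_x, tot_y = sum(xs), sum(ys)
    sum_sq = sum(v * v for v in xs) + sum(v * v for v in ys)
    between_term = (tot_x ** 2 + tot_y ** 2) / n   # sum of squared totals / n
    correction = (tot_x + tot_y) ** 2 / (2 * n)     # correction for the grand mean
    ss_between = between_term - correction          # 1 df
    ss_within = sum_sq - between_term               # 2n - 2 df
    ms_within = ss_within / (2 * (n - 1))
    # Same guard as the FORTRAN function: no within-group variation gives 0.
    f = ss_between / ms_within if ms_within > 0 else 0.0
    return tot_x / n, tot_y / n, f


# Upper half (ranks 6-10) of two groups of 10 scores each.
print(split_anova(list(range(1, 11)), list(range(3, 13)), 6, 10))  # -> (8.0, 10.0, 4.0)
```

For a median split of 10 scores per group, the upper half is ranks 6 through 10, as in the demo call; the returned statistic is then judged against a Monte Carlo distribution rather than F tables.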

Instead of using the F distribution, Monte Carlo methods are used to evaluate the significance of the obtained test statistic. Specifically, a large number of data sets, say 5000, of the same sample size would be generated under the null hypothesis using a pseudorandom normal deviate generator. Each of these data sets would be processed identically to the original data set, creating an empirical distribution of the test statistic under the null hypothesis for the same sample size and data split. The obtained test statistic is then evaluated relative to this empirical distribution to determine significance. For example, if 10 of the Monte Carlo values are larger than the obtained value, the probability value associated with the obtained value would be 10/5000 = 0.002.
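The Monte Carlo step can be sketched in Python as below. The helper names `null_distribution` and `empirical_p` are hypothetical, and `random.Random.gauss` stands in for the program's own normal deviate generator. `stat` is any function of two samples; in a splits analysis it would sort, split, and return the ANOVA statistic for each simulated pair. The probability value is the proportion of simulated statistics strictly larger than the obtained one, as in the 10/5000 example.

```python
import random


def null_distribution(stat, n_per_group, n_sims, seed=None):
    """Simulate n_sims pairs of N(0, 1) samples under the null hypothesis
    and return the statistic computed on each pair."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        values.append(stat(x, y))
    return values


def empirical_p(f_obs, null_values):
    """Proportion of Monte Carlo statistics strictly larger than f_obs."""
    larger = sum(1 for f in null_values if f > f_obs)
    return larger / len(null_values)


# Illustrative values: 10 of 5000 simulated statistics exceed the obtained one.
print(empirical_p(3.0, [0.5] * 4990 + [5.0] * 10))  # -> 0.002
```

Counting directly avoids sorting the simulated values; sorting, as the FORTRAN program does, gives the same count.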

The previous example would be roughly analogous to Lunneborg's bootstrap comparison of the 75th percentiles. Initial research by the author has indicated that the splits analysis approach maintained the .01 and .05 alpha levels, whereas the bootstrap procedure tended to be overly conservative (Rasmussen, 1990).

Table 1 presents a small data set along with the results from a splits analysis. In the example, there are 9 cases per group, and the splits analysis compares the lower third (ranks 1 to 3) of each group. The ANOVA test statistic resulting from the splits analysis is 36.75. If this were a standard F ratio (i.e., with 1 and 4 degrees of freedom), it would have a probability value of .0037. The splits analysis probability of 0.016 is less extreme.


Insert Table 1 about Here

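The nominal .0037 can be checked without F tables: an F ratio with 1 and 4 degrees of freedom is the square of a t statistic with 4 degrees of freedom, and the t distribution with 4 degrees of freedom has a closed-form CDF. The Python sketch below (hypothetical helper names) uses that identity; it reproduces the standard F-table probability, not the splits analysis probability.

```python
import math


def t4_cdf(t):
    """CDF of Student's t distribution with 4 degrees of freedom (closed form)."""
    q = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(q)) * (1.0 - t * t / (12.0 * q))


def f_1_4_pvalue(f):
    """Upper-tail probability of an F ratio with 1 and 4 df, via F = t**2."""
    t = math.sqrt(f)
    return 2.0 * (1.0 - t4_cdf(t))


print(round(f_1_4_pvalue(36.75), 4))  # -> 0.0037
```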
Similar to bootstrapping and approximate randomization procedures, the probability value associated with splits analysis is an approximation whose precision depends upon the number of Monte Carlo simulations and the significance level (Rasmussen, 1988; Rasmussen, 1989). With a known significance level, the formula for the standard error is $SE = \sqrt{s(1 - s)/m}$, where $s$ is the significance level and $m$ is the number of Monte Carlo simulations.

This formula can be used to evaluate the probability that a given approximate probability value is less than a desired probability value. For example, the probability that the approximate probability value of 0.016 is less than a desired probability value of 0.05 can be evaluated from $SE = \sqrt{.05(1 - .05)/5000} = .00308$. Using the standard score formula, $z = (.016 - .05)/.00308 = -11.04$. A z score of such magnitude indicates that it is extremely unlikely that the probability value approximated by 0.016 is actually greater than the .05 level. When the approximate probability value is close to the desired value, a larger number of simulations can be carried out.
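The standard-error check above can be reproduced in a few lines of Python; the function name `mc_standard_error` is illustrative only.

```python
import math


def mc_standard_error(s, m):
    """Standard error of a Monte Carlo probability at significance level s
    estimated from m simulations: sqrt(s * (1 - s) / m)."""
    return math.sqrt(s * (1.0 - s) / m)


se = mc_standard_error(0.05, 5000)
z = (0.016 - 0.05) / se
print(f"SE = {se:.5f}, z = {z:.2f}")  # -> SE = 0.00308, z = -11.03
```

The small difference from the -11.04 given in the text comes from rounding SE to .00308 before dividing; the conclusion is unchanged.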


Program execution
The program asks for the analysis parameters interactively and reads the data from a file. It requires the sample size per group for the entire data set and the lower and upper ordinal values (ranks) that define the desired split. For example, for a sample size of 12, a lower value of 1 and an upper value of 3 would compare the lower quarter of the distributions, whereas a lower value of 9 and an upper value of 12 would compare the upper third. The program also requests the number of Monte Carlo simulations to carry out. On the VAX, the execution time can be estimated as CPU (central processing unit) seconds $= 7.4 \times 10^{-5}\, nm$, where $n$ is the sample size per group after the data split and $m$ is the number of Monte Carlo simulations. For example, with 40 scores per group after the data split and 20,000 repetitions, the program requires approximately 60 CPU seconds. Finally, the program requires the name of the data file. The data are read by group (all scores for group 1, then all scores for group 2) using free format, with one score per record. The program then carries out the appropriate split on the data and on each Monte Carlo simulation. The group means on the split data, the test statistic, and the probability value are then calculated and printed.
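A reader for the data file layout described above might look like this Python sketch. The name `read_groups` is hypothetical, and skipping blank records is an assumption the original does not state. It accepts any iterable of text lines, so an open file object works as well as a list.

```python
def read_groups(lines, n_per_group):
    """Parse one score per record: the first n_per_group scores are group 1,
    the next n_per_group are group 2. Blank records are skipped (assumption)."""
    scores = [float(line) for line in lines if line.strip()]
    if len(scores) < 2 * n_per_group:
        raise ValueError("fewer than 2 * n_per_group scores in the data")
    return scores[:n_per_group], scores[n_per_group:2 * n_per_group]


# Illustrative records; with a real file, pass an open file object instead.
x, y = read_groups(["13", "16", "19", "28", "29", "33"], 3)
print(x, y)  # -> [13.0, 16.0, 19.0] [28.0, 29.0, 33.0]
```

With a data file, the call would be `with open(path) as fh: x, y = read_groups(fh, n_per_group)`, where `path` is whatever file name the user supplies.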

Table 2 gives the FORTRAN coding of the splits analysis along with an efficient ANOVA function. The program will also require a random normal deviate generator (the function RNORM) and an efficient sorting routine (the subroutine SORT). These are readily available in Lehman (1977) and Miller (1982) or can be obtained from the author.


Insert Table 2 about Here

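The paper does not say which method RNORM implements. As one possibility, the Python sketch below swaps in the Marsaglia polar method for generating normal deviates; it is not the routine from Lehman (1977) or Miller (1982), and the function names are hypothetical. In Python, the built-in `sorted` covers the sorting requirement.

```python
import math
import random


def polar_normal_pair(rng):
    """Marsaglia polar method: return two independent N(0, 1) deviates."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:  # accept only points strictly inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor


def normal_deviates(count, seed):
    """Generate `count` N(0, 1) deviates from a seeded uniform generator."""
    rng = random.Random(seed)
    out = []
    while len(out) < count:
        out.extend(polar_normal_pair(rng))
    return out[:count]


print(sorted(normal_deviates(5, seed=12345)))
```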

The program currently runs on a VAX 8800 computer. To run the program on another system, it will probably be necessary to change the OPEN statement and the unit numbers associated with the READ and WRITE statements. In addition, the SECNDS and RAN functions may be different on other systems. The SECNDS function is used to give a different series of random numbers on each run, based on the time in seconds since midnight. On systems that cannot readily provide a function to give the time, the program can be modified to ask the user for a seed (e.g., a random nine-digit odd number) to start the random number generator.
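The two seeding options described above can be sketched in Python: a seed derived from the seconds elapsed since midnight, loosely following the `III = SECNDS(XXX) * 2000 + 1` line in Table 2 (the VAX function also subtracts its argument, which this sketch omits), or a seed typed in by the user. The function names are hypothetical.

```python
import datetime
import random


def seed_from_clock(now=None):
    """Seconds since midnight, scaled and offset as in the Table 2 seed
    (seconds * 2000 + 1); the SECNDS argument offset is omitted."""
    if now is None:
        now = datetime.datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (now - midnight).total_seconds()
    return int(seconds * 2000) + 1


def make_generator(user_seed=None):
    """Prefer a user-supplied seed; fall back to the clock-based seed."""
    seed = user_seed if user_seed is not None else seed_from_clock()
    return random.Random(seed)


rng = make_generator(123456789)  # e.g., a nine-digit odd seed typed by the user
print(rng.random())
```

A fixed user seed also makes a run reproducible, which a clock-based seed does not.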
Table 1

Sample splits analysis

Group 1   Group 2
   13        28
   16        29
   19        33
   24        34
   19        38
   32        40
   36        41
   37        43
   42        45

Mean 1    Mean 2
 16.0      30.0

Test Statistic: 36.75
Probability: 0.016
Table 2

Source Code for Splits Analysis


C     NPERG may not exceed 1000 and NMC may not exceed 100000
C     (the dimensions of X, Y, and FMC).
      REAL X(1000), Y(1000), FMC(100000)
      CHARACTER IFILE*20
C     Seed the random number generator from the system clock.
      XXX = 1.0
      III = SECNDS(XXX) * 2000 + 1
      WRITE (6,19)
   19 FORMAT(' This program calculates probability values'/
     1 ' for splits on data. Give the sample size per group,'/
     1 ' lower and upper split values, and number of monte'/
     1 ' carlo trials.'/)
      READ (6,*) NPERG, ISPLTL, ISPLTU, NMC
      XNMC = NMC
C     Number of ranked scores per group retained by the split.
      NSPLT = ISPLTU - ISPLTL + 1
      XNSPLT = NSPLT
      WRITE (6,29)
   29 FORMAT(' Give the name of the data file'/)
      READ (6,39) IFILE
   39 FORMAT(A20)
      OPEN (27, FILE = IFILE, STATUS = 'UNKNOWN')
C     Read all group 1 scores, then all group 2 scores.
      DO 10 I = 1, NPERG
        READ (27,*) X(I)
   10 CONTINUE
      DO 20 I = 1, NPERG
        READ (27,*) Y(I)
   20 CONTINUE
C     Rank order within groups and analyze the split data.
      CALL SORT (X, NPERG)
      CALL SORT (Y, NPERG)
      FOBS = ANOVA (X, Y, TOTX, TOTY, ISPLTL, ISPLTU, NSPLT)
      XMEAN = TOTX / XNSPLT
      YMEAN = TOTY / XNSPLT
C     Monte Carlo null distribution: generate normal samples of the
C     same size and process them exactly like the real data.
      DO 40 IREP = 1, NMC
        DO 30 I = 1, NPERG
          X(I) = RNORM(III)
          Y(I) = RNORM(III)
   30   CONTINUE
        CALL SORT (X, NPERG)
        CALL SORT (Y, NPERG)
        FMC(IREP) = ANOVA (X, Y, TOTX, TOTY, ISPLTL, ISPLTU, NSPLT)
   40 CONTINUE
      CALL SORT (FMC, NMC)
      ITST = 0
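C     Count the Monte Carlo values that do not exceed FOBS; FMC is
C     sorted, so the loop stops at the first larger value.
      DO 50 IREP = 1, NMC
        IF (FOBS .LT. FMC(IREP)) GOTO 51
        ITST = ITST + 1
   50 CONTINUE
   51 CONTINUE
C     Probability = proportion of Monte Carlo values larger than FOBS.
      XTST = ITST
      PROB = (XNMC - XTST) / XNMC
      WRITE (6,49) XMEAN, YMEAN, FOBS, PROB
   49 FORMAT(//' Means:',2F12.4/
     1 ' Test Statistic:',F12.4/' Probability:',F12.4//)
      STOP
      END
C
      FUNCTION ANOVA (X, Y, TOT1, TOT2, ISPLTL, ISPLTU, N)
C     One-way ANOVA on ranks ISPLTL..ISPLTU of the sorted arrays X
C     and Y (N scores per group). Returns F with 1 and 2N-2 df and
C     the split group totals TOT1 and TOT2.
      REAL X(1000), Y(1000)
      XN = N
      XNTOT = XN * 2.0
      DFW = 2.0 * (XN - 1.0)
      TOT1 = 0
      TOT2 = 0
      SXSQ = 0
      DO 10 I = ISPLTL, ISPLTU
        TOT1 = TOT1 + X(I)
        TOT2 = TOT2 + Y(I)
        SXSQ = SXSQ + X(I)**2 + Y(I)**2
   10 CONTINUE
C     Between-groups SS (1 df, so it is also the mean square) and
C     within-groups mean square.
      TOTV = (TOT1**2 + TOT2**2) / XN
      CF = (TOT1 + TOT2)**2 / XNTOT
      SSB = TOTV - CF
      SSW = SXSQ - TOTV
      VMSW = SSW / DFW
C     Return 0 when there is no within-group variation.
      ANOVA = 0.0
      IF (VMSW .GT. 0.0) ANOVA = SSB / VMSW
      RETURN
      END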


